Conversation
There was a problem hiding this comment.
14 issues found across 65 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/image.c">
<violation number="1" location="src/image.c:1067">
P3: `munmap(elf_buf, elf_buf_len)` is called when `kbox_read_elf_header_window_fd` failed. At that point `elf_buf` is still `NULL` (never assigned by the failing function). The `munmap(NULL, 0)` call is a no-op on Linux but is dead/incorrect code — remove it or guard with `if (elf_buf)`.</violation>
</file>
<file name="src/loader-image.c">
<violation number="1" location="src/loader-image.c:61">
P2: `fail:` path discards the specific `rc` error code and always returns `-1`. Since `rc` is set to meaningful negative errno values at each failure site, this should be `return rc;` to preserve diagnostic information for callers.</violation>
</file>
<file name="src/fd-table.c">
<violation number="1" location="src/fd-table.c:151">
P3: `kbox_fd_table_insert_hostonly` is newly added but has no call sites, so it is dead code and can drift from the real host-only allocation path.</violation>
</file>
<file name="src/probe.c">
<violation number="1" location="src/probe.c:181">
P2: Check `waitpid` return status in the seccomp filter probe; failures/interruption can currently be misclassified as success.</violation>
</file>
<file name="tests/unit/test-syscall-trap.c">
<violation number="1" location="tests/unit/test-syscall-trap.c:451">
P2: Avoid an unbounded blocking `read()` in the test; add a bounded wait so failures don't hang CI.</violation>
<violation number="2" location="tests/unit/test-syscall-trap.c:478">
P2: Poll `has_pending_dispatch` with an atomic load; plain reads race the service thread.</violation>
</file>
<file name="tests/guest/trap-bench.S">
<violation number="1" location="tests/guest/trap-bench.S:18">
P2: The benchmark loop adds per-iteration `push/pop` overhead inside the timed section, which skews the reported getpid latency.</violation>
</file>
<file name="src/procmem.c">
<violation number="1" location="src/procmem.c:141">
P2: `close(fd)` between a failed `pwrite` and the `errno` read can clobber the original error code. Save `errno` before calling `close()`.</violation>
</file>
<file name="src/syscall-trap.c">
<violation number="1" location="src/syscall-trap.c:165">
P0: Incorrect argument register assignment in `rt_sigprocmask` assembly: `sigset_size` is placed in `%rdx` (oldset) instead of `%r10` (sigsetsize), and `%ecx` is zeroed instead of `%r10`. The kernel's 4th syscall argument is `%r10`, not `%rcx`. This will either EINVAL (stale `%r10`) or crash by writing the old sigmask to address `sigset_size`.</violation>
<violation number="2" location="src/syscall-trap.c:438">
P0: Same argument-shuffle bug on aarch64: `sigset_size` is placed in `x2` (oldset) instead of `x3` (sigsetsize), and `x3` is zeroed. This will crash or EINVAL.</violation>
<violation number="3" location="src/syscall-trap.c:998">
P2: When `service_running` is true, `trap_host_write` returns raw kernel error codes (e.g., `-EAGAIN`) without setting the C `errno`. The subsequent `errno == EAGAIN` check is only correct for the libc `write()` path; the raw-syscall path will incorrectly return -1 on a benign `EAGAIN`.</violation>
</file>
<file name="tests/unit/test-rewrite.c">
<violation number="1" location="tests/unit/test-rewrite.c:930">
P2: The classification test is ineffective because it expects the same result for known and unknown syscall numbers, so it cannot catch a broken implementation.</violation>
</file>
<file name="docs/gdb-workflow.md">
<violation number="1" location="docs/gdb-workflow.md:210">
P3: The new dispatch breakpoint description is inaccurate: `kbox-syscall-trace` currently breaks on `kbox_dispatch_syscall` only, not `kbox_dispatch_request`.</violation>
</file>
<file name="src/rewrite.c">
<violation number="1" location="src/rewrite.c:1">
P1: `current[4]` is too small for x86_64 wrapper patches (width 8). The `width > sizeof(current)` guard will reject every x86_64 site, making this function silently fail for x86_64 binaries. Other memfd patching functions in this file use `KBOX_REWRITE_MAX_PATCH_BYTES` for this buffer.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| "mov %rsi, %rdx\n" | ||
| "mov $" XSTR(FUTEX_WAIT_PRIVATE) ", %esi\n" | ||
| "xor %r10d, %r10d\n" | ||
| "xor %r8d, %r8d\n" | ||
| "xor %r9d, %r9d\n" |
There was a problem hiding this comment.
P0: Incorrect argument register assignment in rt_sigprocmask assembly: sigset_size is placed in %rdx (oldset) instead of %r10 (sigsetsize), and %ecx is zeroed instead of %r10. The kernel's 4th syscall argument is %r10, not %rcx. This will either EINVAL (stale %r10) or crash by writing the old sigmask to address sigset_size.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall-trap.c, line 165:
<comment>Incorrect argument register assignment in `rt_sigprocmask` assembly: `sigset_size` is placed in `%rdx` (oldset) instead of `%r10` (sigsetsize), and `%ecx` is zeroed instead of `%r10`. The kernel's 4th syscall argument is `%r10`, not `%rcx`. This will either EINVAL (stale `%r10`) or crash by writing the old sigmask to address `sigset_size`.</comment>
<file context>
@@ -0,0 +1,1534 @@
+ ".globl kbox_syscall_trap_host_futex_wait_start\n"
+ "kbox_syscall_trap_host_futex_wait_start:\n"
+ "kbox_syscall_trap_host_futex_wait_private:\n"
+ "mov %rsi, %rdx\n"
+ "mov $" XSTR(FUTEX_WAIT_PRIVATE) ", %esi\n"
+ "xor %r10d, %r10d\n"
</file context>
| "mov %rsi, %rdx\n" | |
| "mov $" XSTR(FUTEX_WAIT_PRIVATE) ", %esi\n" | |
| "xor %r10d, %r10d\n" | |
| "xor %r8d, %r8d\n" | |
| "xor %r9d, %r9d\n" | |
| "mov %rsi, %r10\n" | |
| "mov %rdi, %rsi\n" | |
| "xor %edx, %edx\n" | |
| "xor %r8d, %r8d\n" | |
| "xor %r9d, %r9d\n" |
| size_t size) | ||
| { | ||
| if (!image || image->region_count >= KBOX_LOADER_MAX_MAPPINGS) | ||
| return -1; |
There was a problem hiding this comment.
P2: fail: path discards the specific rc error code and always returns -1. Since rc is set to meaningful negative errno values at each failure site, this should be return rc; to preserve diagnostic information for callers.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/loader-image.c, line 61:
<comment>`fail:` path discards the specific `rc` error code and always returns `-1`. Since `rc` is set to meaningful negative errno values at each failure site, this should be `return rc;` to preserve diagnostic information for callers.</comment>
<file context>
@@ -0,0 +1,186 @@
+ size_t size)
+{
+ if (!image || image->region_count >= KBOX_LOADER_MAX_MAPPINGS)
+ return -1;
+ image->regions[image->region_count++] = (struct kbox_loader_image_region) {
+ .addr = addr,
</file context>
| return -1; | |
| return rc; |
caf83d2 to
07915f3
Compare
There was a problem hiding this comment.
8 issues found across 71 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/loader-image.c">
<violation number="1" location="src/loader-image.c:150">
P2: Memory leak: if `record_region` fails, the mapping from `map_region_exact` is never recorded and the `fail` cleanup path (`unmap_recorded_regions`) won't unmap it. Add `munmap(mapped, mapping->size)` before the `goto fail`.</violation>
</file>
<file name="src/elf.c">
<violation number="1" location="src/elf.c:328">
P3: Redundant `memcpy` — `pread_full` on the next line re-reads from offset 0 into the same buffer, overwriting everything that was just copied. Remove the `memcpy` to avoid up to 256 KB of unnecessary work (especially relevant since the comment notes this path can execute inside a SIGSYS handler).</violation>
</file>
<file name="tests/unit/test-syscall-trap.c">
<violation number="1" location="tests/unit/test-syscall-trap.c:447">
P2: Initialize ctx.host_nrs before starting the service thread; the dispatch path dereferences ctx->host_nrs and will crash with the zeroed ctx.</violation>
</file>
<file name="src/procmem.c">
<violation number="1" location="src/procmem.c:97">
P2: Stale `errno` when `pread` returns 0: `errno` is not cleared on success/EOF, so `errno ? -errno : -EIO` may return a stale error code from a prior failed call instead of the intended `-EIO`. Split the check into `n < 0` (use errno) and `n == 0` (return -EIO directly).</violation>
</file>
<file name="src/rewrite.c">
<violation number="1" location="src/rewrite.c:1">
P2: Unsigned underflow in bounds check: `image_len - 8` wraps to `SIZE_MAX - 7` when `image_len < 8`, silently bypassing the guard. Replace with an addition-based comparison to avoid the wraparound.</violation>
</file>
<file name="src/syscall-trap.c">
<violation number="1" location="src/syscall-trap.c:1113">
P2: `service_stop` is read/written across threads without atomics, unlike the other shared flags in this struct (`has_pending_request`, `has_pending_dispatch`) which use `__atomic_load_n`/`__atomic_store_n`. The compiler may hoist the non-atomic load out of the service-thread loop, preventing the thread from seeing the stop flag.</violation>
</file>
<file name="tests/guest/bench-test.c">
<violation number="1" location="tests/guest/bench-test.c:31">
P2: The benchmark silently measures the error path if stat() fails.</violation>
<violation number="2" location="tests/guest/bench-test.c:46">
P2: The benchmark silently ignores open() failures, which can result in measuring the error path and outputting deceptively fast numbers.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| &mapped); | ||
| if (rc < 0) | ||
| goto fail; | ||
| if (record_region(image, mapped, (size_t) mapping->size) < 0) { |
There was a problem hiding this comment.
P2: Memory leak: if record_region fails, the mapping from map_region_exact is never recorded and the fail cleanup path (unmap_recorded_regions) won't unmap it. Add munmap(mapped, mapping->size) before the goto fail.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/loader-image.c, line 150:
<comment>Memory leak: if `record_region` fails, the mapping from `map_region_exact` is never recorded and the `fail` cleanup path (`unmap_recorded_regions`) won't unmap it. Add `munmap(mapped, mapping->size)` before the `goto fail`.</comment>
<file context>
@@ -0,0 +1,186 @@
+ &mapped);
+ if (rc < 0)
+ goto fail;
+ if (record_region(image, mapped, (size_t) mapping->size) < 0) {
+ rc = -ENOMEM;
+ goto fail;
</file context>
| ASSERT_EQ( | ||
| kbox_syscall_trap_runtime_init(&runtime, &ctx, &capture_only_trap_ops), | ||
| 0); | ||
| ASSERT_EQ(kbox_syscall_trap_runtime_service_start(&runtime), 0); |
There was a problem hiding this comment.
P2: Initialize ctx.host_nrs before starting the service thread; the dispatch path dereferences ctx->host_nrs and will crash with the zeroed ctx.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/unit/test-syscall-trap.c, line 447:
<comment>Initialize ctx.host_nrs before starting the service thread; the dispatch path dereferences ctx->host_nrs and will crash with the zeroed ctx.</comment>
<file context>
@@ -0,0 +1,512 @@
+ ASSERT_EQ(
+ kbox_syscall_trap_runtime_init(&runtime, &ctx, &capture_only_trap_ops),
+ 0);
+ ASSERT_EQ(kbox_syscall_trap_runtime_service_start(&runtime), 0);
+ ASSERT_EQ(kbox_syscall_trap_runtime_capture(&runtime, &req), 0);
+
</file context>
| @@ -0,0 +1,4060 @@ | |||
| /* SPDX-License-Identifier: MIT */ | |||
There was a problem hiding this comment.
P2: Unsigned underflow in bounds check: image_len - 8 wraps to SIZE_MAX - 7 when image_len < 8, silently bypassing the guard. Replace with an addition-based comparison to avoid the wraparound.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/rewrite.c:
<comment>Unsigned underflow in bounds check: `image_len - 8` wraps to `SIZE_MAX - 7` when `image_len < 8`, silently bypassing the guard. Replace with an addition-based comparison to avoid the wraparound.</comment>
18c2e70 to
5c09f57
Compare
There was a problem hiding this comment.
4 issues found across 71 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/loader-layout.c">
<violation number="1" location="src/loader-layout.c:76">
P2: Missing overflow checks for `load_bias + seg->map_start` (and similar `load_bias +` additions at lines 99, 103, 113). The function uses `__builtin_add_overflow` for `seg->vaddr` arithmetic but performs these bias additions unchecked, which could wrap around with a large load bias. Consider adding overflow checks for consistency with the existing pattern.</violation>
</file>
<file name="src/seccomp-bpf.c">
<violation number="1" location="src/seccomp-bpf.c:672">
P1: Bounds check on `filter[]` runs after all writes complete — if `idx` ever exceeds `MAX_PROG_LEN` during emission, the stack is already corrupted. Move the overflow check into the emit helpers or add a guard before each `filter[idx++]` write, or at minimum assert `idx < MAX_PROG_LEN` at the start of each emit loop iteration.</violation>
</file>
<file name="src/syscall-trap.c">
<violation number="1" location="src/syscall-trap.c:1232">
P0: On x86_64, `host_syscall` reads the syscall number from `REG_RAX` in the ucontext, but the kernel has already overwritten RAX with `-ENOSYS` before delivering SIGSYS for `SECCOMP_RET_TRAP`. This means every CONTINUE-dispatched syscall in trap mode will execute `syscall(-ENOSYS)` instead of the intended syscall. The request struct built earlier by `kbox_syscall_regs_from_sigsys` has the correct number (from `si_syscall`), but `host_syscall` doesn't use it.</violation>
</file>
<file name="tests/unit/test-syscall-trap.c">
<violation number="1" location="tests/unit/test-syscall-trap.c:358">
P2: Initialize `ctx.host_nrs` before calling `kbox_syscall_dispatch_sigsys`; the dispatch path dereferences it unconditionally, so leaving it NULL will crash this test.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| static int64_t host_syscall(const ucontext_t *uc) | ||
| { | ||
| #if defined(__x86_64__) | ||
| long nr = uc->uc_mcontext.gregs[REG_RAX]; |
There was a problem hiding this comment.
P0: On x86_64, host_syscall reads the syscall number from REG_RAX in the ucontext, but the kernel has already overwritten RAX with -ENOSYS before delivering SIGSYS for SECCOMP_RET_TRAP. This means every CONTINUE-dispatched syscall in trap mode will execute syscall(-ENOSYS) instead of the intended syscall. The request struct built earlier by kbox_syscall_regs_from_sigsys has the correct number (from si_syscall), but host_syscall doesn't use it.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall-trap.c, line 1232:
<comment>On x86_64, `host_syscall` reads the syscall number from `REG_RAX` in the ucontext, but the kernel has already overwritten RAX with `-ENOSYS` before delivering SIGSYS for `SECCOMP_RET_TRAP`. This means every CONTINUE-dispatched syscall in trap mode will execute `syscall(-ENOSYS)` instead of the intended syscall. The request struct built earlier by `kbox_syscall_regs_from_sigsys` has the correct number (from `si_syscall`), but `host_syscall` doesn't use it.</comment>
<file context>
@@ -0,0 +1,1542 @@
+static int64_t host_syscall(const ucontext_t *uc)
+{
+#if defined(__x86_64__)
+ long nr = uc->uc_mcontext.gregs[REG_RAX];
+
+ if (nr == __NR_exit || nr == __NR_exit_group)
</file context>
| filter[idx++] = (struct kbox_sock_filter) KBOX_BPF_STMT( | ||
| KBOX_BPF_RET | KBOX_BPF_K, KBOX_SECCOMP_RET_TRAP); | ||
|
|
||
| if (idx > MAX_PROG_LEN) { |
There was a problem hiding this comment.
P1: Bounds check on filter[] runs after all writes complete — if idx ever exceeds MAX_PROG_LEN during emission, the stack is already corrupted. Move the overflow check into the emit helpers or add a guard before each filter[idx++] write, or at minimum assert idx < MAX_PROG_LEN at the start of each emit loop iteration.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/seccomp-bpf.c, line 672:
<comment>Bounds check on `filter[]` runs after all writes complete — if `idx` ever exceeds `MAX_PROG_LEN` during emission, the stack is already corrupted. Move the overflow check into the emit helpers or add a guard before each `filter[idx++]` write, or at minimum assert `idx < MAX_PROG_LEN` at the start of each emit loop iteration.</comment>
<file context>
@@ -334,15 +492,214 @@ int kbox_install_seccomp_listener(const struct kbox_host_nrs *h)
+ filter[idx++] = (struct kbox_sock_filter) KBOX_BPF_STMT(
+ KBOX_BPF_RET | KBOX_BPF_K, KBOX_SECCOMP_RET_TRAP);
+
+ if (idx > MAX_PROG_LEN) {
+ errno = EINVAL;
+ return -1;
</file context>
| info.si_signo = SIGSYS; | ||
| #endif | ||
|
|
||
| ASSERT_EQ(kbox_syscall_dispatch_sigsys(&ctx, 55, &info, &uc), expected_rc); |
There was a problem hiding this comment.
P2: Initialize ctx.host_nrs before calling kbox_syscall_dispatch_sigsys; the dispatch path dereferences it unconditionally, so leaving it NULL will crash this test.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/unit/test-syscall-trap.c, line 358:
<comment>Initialize `ctx.host_nrs` before calling `kbox_syscall_dispatch_sigsys`; the dispatch path dereferences it unconditionally, so leaving it NULL will crash this test.</comment>
<file context>
@@ -0,0 +1,517 @@
+ info.si_signo = SIGSYS;
+#endif
+
+ ASSERT_EQ(kbox_syscall_dispatch_sigsys(&ctx, 55, &info, &uc), expected_rc);
+#if defined(__x86_64__)
+ ASSERT_EQ(uc.uc_mcontext.gregs[REG_RAX], info.si_syscall);
</file context>
This implements syscall rewriting fast-path, adding a complete binary
rewriting layer alongside the existing seccomp-unotify (Tier 3) and
SIGSYS trap (Tier 1) interception paths.
Core additions:
- x86-64 instruction length decoder (x86-decode.c) that walks true
instruction boundaries, eliminating false-positive 0F 05/0F 34
matches inside immediates and displacements
- Site-aware rewrite classification (WRAPPER vs COMPLEX) to gate the
inline fast path: only simple syscall+ret wrapper sites return
virtualized PID values (getpid=1, gettid=1, getppid=0) directly;
complex sites (raise->gettid->tgkill) use full dispatch
- aarch64 binary rewriting: SVC #0 patched to B-branch trampolines
with veneer pages (LDR+BR indirect stubs) for sites beyond 128MB
- Userspace ELF loader for in-process trap/rewrite launch
- Unified syscall request abstraction across all three tiers
- Per-architecture AUTO mode: seccomp on x86_64, rewrite/trap on
aarch64 for non-shell commands without fork wrappers
- W^X enforcement (PROT_WRITE|PROT_EXEC blocked in trap/rewrite mode)
- CLONE_THREAD blocked in trap/rewrite (EPERM); multi-threaded guests
must use --syscall-mode=seccomp
- Shadow FD cache with dup() per open for independent file offsets
- Safe argv/envp validation in trap-mode execve via process_vm_readv
- Execve re-translation: rewrite runtime re-installed on new binary
- Fork-site scanner for AUTO mode safety (detects fork/clone wrappers)
Benchmark (bench-test, 200 iterations, release build, us/call):
x86_64 (AMD Threadripper 2990WX, Linux 6.8.0):
syscall native seccomp trap rewrite auto
stat 2.6 13.5 22.0 20.2 13.5
open+close 5.4 25.2 89.6 58.9 25.3
lseek+read 1.9 2.3 2.8 2.1 2.4
write 2.7 2.2 2.5 2.0 2.4
getpid 0.3 0.0 0.1 0.1 0.0
aarch64 (Marvell ThunderX2, Linux 6.14.0):
syscall native seccomp trap rewrite auto
stat 1.5 21.7 25.1 1.0 21.4
open+close 3.4 40.3 138.4 117.0 39.2
lseek+read 1.0 1.3 1.6 1.6 1.4
write 1.1 1.4 1.8 1.9 1.4
getpid 0.3 0.0 0.1 0.1 0.0
* AUTO selects seccomp on x86_64 (lower USER_NOTIF overhead), rewrite on
aarch64 (stat 1.0us via in-process LKL inode cache vs 21.7us
* seccomp round-trip). aarch64 rewrite stat at 0.7x native is real: LKL
serves from memory without hitting the block layer.
Change-Id: I9052fc27388027b9147348c0998bc8fa4e1e123e
5c09f57 to
6080aad
Compare
This implements syscall rewriting fast-path, adding a complete binary rewriting layer alongside the existing seccomp-unotify (Tier 3) and SIGSYS trap (Tier 1) interception paths.
Core additions:
Benchmark (bench-test, 200 iterations, release build, us/call):
Change-Id: I9052fc27388027b9147348c0998bc8fa4e1e123e